fix(run): emit session_error for session failures by byapparov · Pull Request #81 · aictrl-dev/cli

byapparov · 2026-06-13T13:02:33Z

Summary

emit structured session_error when the primary session publishes session.error
classify stored CLI NamedError objects without dropping data.message / data.statusCode
add coverage for stored auth/API error classification and v1 schema emission

Refs #63.

Validation

cd packages/cli && bun test test/cli/classify-session-error.test.ts test/cli/run-schema-v1.test.ts test/cli/run-session-error-exit.test.ts --timeout 30000
cd packages/cli && bun run typecheck

aictrl-dev · 2026-06-17T06:22:46Z

I'll start by exploring the repository and understanding the PR changes.Let me get the PR diff and the full file contents.Let me get the surrounding context in run.ts and understand the emit function signature.

aictrl-dev · 2026-06-22T06:49:56Z

Code review

Verdict: Looks good — only minor / nit comments below. · 🔴 0 · 🟠 0 · 🟡 2 · ⚪ 1 · 0/3 resolved

🟡 packages/cli/src/cli/cmd/run.ts:575-579 — session_error wiring lacks a behavioral test
🟡 packages/cli/src/cli/cmd/run.ts:575-581 — session_error + error both emitted; confirm no double-count
⚪ packages/cli/test/cli/run-schema-v1.test.ts:49 — Brittle hardcoded 1400-char window in structural test

🤖 Fix all 3 open findings with your agent

Fix the following code review findings on aictrl-dev/cli PR #81 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:575-579 — session_error wiring lacks a behavioral test
   Detail: The new `emit("session_error", …)` emission on the primary `session.error` path is verified only by `run-schema-v1.test.ts`, which asserts that the *source text* of run.ts contains certain substrings (`classifySessionError(props.error)`, `emit("session_error"`, `emit("error"`). That grep-style assertion passes even if `classifySessionError` is not imported, if the emit key is mistyped, or if the surrounding control flow is broken — it never drives a real `session.error` and observes the emitted payload. Consider a behavioral test (stub/observe `emit` and assert the actual `session_error` event is produced with the right reason/code/message) so this wiring can't silently regress.
2. packages/cli/src/cli/cmd/run.ts:575-581 — session_error + error both emitted; confirm no double-count
   Detail: For primary `session.error`s, both the new structured `session_error` event and the existing generic `error` event are now emitted (and `UI.error` may still print afterward). If any consumer/observer subscribes to both channels, the same failure can be counted or reported twice. If dual emission is intentional (structured channel vs. legacy channel), a short comment noting that would help future readers; otherwise consider gating the legacy `error`/`UI.error` path when `session_error` is handled.
3. packages/cli/test/cli/run-schema-v1.test.ts:49 — Brittle hardcoded 1400-char window in structural test
   Detail: `source.slice(idx, idx + 1400)` anchors the assertion window to a magic number. If the `session.error` block grows past 1400 chars (more conditions, longer payloads), a target string can slide outside the window and the `toContain` checks either miss the intended region or force the magic number to be bumped on every change. Prefer anchoring the window to a stable boundary (e.g. slice to the matching closing brace / next event handler) or assert against the full source so the test degrades predictably.

📋 Out-of-diff findings (3)

Sev	Location	Finding
🟡	`packages/cli/src/cli/cmd/run.ts:575-579`	session_error wiring lacks a behavioral test
🟡	`packages/cli/src/cli/cmd/run.ts:575-581`	session_error + error both emitted; confirm no double-count
⚪	`packages/cli/test/cli/run-schema-v1.test.ts:49`	Brittle hardcoded 1400-char window in structural test

Reviewed 4 files · 0 inline · view all 3 findings ↗

aictrl · AI code review for fast-moving teams · aictrl.dev

- Add behavioral test (run-session-error-emit.test.ts): spawns the CLI with --format json against a failing model and asserts the session_error JSON event is emitted with a classified reason/message. Closes the source-text-grep gap flagged by the bot. - Fix brittle 1400-char window in run-schema-v1.test.ts: anchor the assertion window to the next event handler boundary instead of a magic character count, matching the pattern used in run-session-error-exit.test.ts. - Add clarifying comment in run.ts session.error handler explaining that session_error (structured/classified) and error (raw/legacy) are intentionally distinct channels serving different consumers, not a double-count bug. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

byapparov · 2026-06-22T07:01:35Z

Review response — PR #81

Triaged 3 findings (2x 🟡 minor, 1x ⚪ nit). All addressed in commit 58e48e420.

Issues addressed (pushed to this PR)

session_error wiring lacks a behavioral test — packages/cli/test/cli/run-session-error-emit.test.ts (new file): spawns the CLI with --format json against a nonexistent model, collects stdout JSON lines, and asserts a session_error event is present with a valid classified reason and non-empty message. This drives the real emission path end-to-end; source-text grep can't catch a mistyped event key or broken control flow. (commit 58e48e420)
Brittle hardcoded 1400-char window — packages/cli/test/cli/run-schema-v1.test.ts:49: replaced source.slice(idx, idx + 1400) with the boundary-anchored pattern already used in run-session-error-exit.test.ts — slices to the next if (event.type === " handler so the window scales with the block instead of requiring a magic number bump on every addition. (commit 58e48e420)

Review claims verified false (no change needed)

"session_error + error both emitted; confirm no double-count" — verified false as a bug. The two events are intentionally distinct channels: session_error (line 575) fires only for the primary session (props.sessionID === sessionID) and carries classified reason/code/message for structured telemetry and CI consumers; error (line 581) fires for both primary and child sessions and carries the raw error object as the legacy observable. They serve different consumers with different payloads. Added a clarifying comment in run.ts to make this explicit for future readers (commit 58e48e420).

aictrl-dev · 2026-06-22T08:16:23Z

Code review

Verdict: Address the major findings before merging. · 🔴 0 · 🟠 1 · 🟡 1 · ⚪ 1 · 0/3 resolved

🟡 packages/cli/src/cli/cmd/run.ts:580 — session_error can be emitted more than once for one failure
⚪ packages/cli/test/cli/run-session-error-emit.test.ts:24 — Unexplained --conditions=browser on a CLI subprocess
🟠 packages/cli/test/cli/run-session-error-emit.test.ts:31 — Behavioral test doesn't cover the new code path

🤖 Fix all 3 open findings with your agent

Fix the following code review findings on aictrl-dev/cli PR #81 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:580 — session_error can be emitted more than once for one failure
   Detail: This PR adds a third `session_error` emitter — the in-loop `session.error` handler for the primary session — alongside two pre-existing ones: `loopDone.catch(...)` and the `promptResult` rejection handler (both already call classifySessionError + emit("session_error")). In a mid-run failure it is plausible that a `session.error` event is observed (firing this new emit) AND `promptResult`/`loop()` later rejects (firing an existing emit), producing two `session_error` JSON lines for a single logical failure. Telemetry/CI consumers that count or branch on `session_error` could double-count or act twice.

Consider guarding all three sites with a shared `let sessionErrorEmitted = false` (emit + set on first occurrence), or document the expected at-most-N semantics for consumers.
   Suggested fix: Introduce a single `let sessionErrorEmitted = false` flag in the run scope and short-circuit all three `session_error` emit sites behind it, so at most one `session_error` is emitted per run regardless of which failure paths fire.
2. packages/cli/test/cli/run-session-error-emit.test.ts:24 — Unexplained --conditions=browser on a CLI subprocess
   Detail: The spawned CLI is run under `bun run --conditions=browser`. For a CLI that depends on Node/Bun runtime APIs (process, fs, child_process, real network calls), forcing the `browser` export condition risks resolving browser-shimmed module exports — potentially a mocked/no-op SDK client that does not exercise the real session.error path the test intends to cover, letting the assertion pass for the wrong reason. If the flag is intentional (to pin a specific module resolution), a one-line comment explaining why would help reviewers; if not, drop it so the subprocess runs under default (node) conditions.
   Suggested fix: Drop `--conditions=browser` unless it is intentionally required; if required, add a comment explaining why the browser condition is needed for this CLI subprocess.
3. packages/cli/test/cli/run-session-error-emit.test.ts:31 — Behavioral test doesn't cover the new code path
   Detail: The test is described as a behavioral regression test for #81, but its failure trigger (`--model nonexistent-provider/nonexistent-model`) almost certainly exercises a PRE-EXISTING emitter rather than the new code this PR adds.

The new code (run.ts, in the `session.error` event handler) emits `session_error` when a `session.error` EVENT is observed during the loop — i.e. a session that started successfully and failed mid-execution. A nonexistent model fails earlier: per the base code's own comment in run.ts's `promptResult` rejection handler, "If SessionPrompt.prompt() rejects BEFORE it enters its internal loop() (e.g., Session.get() fails, model not found)" — i.e. it rejects `promptResult`, and that rejection handler ALREADY emits `session_error` (classifySessionError + emit), predating this PR.

Consequence: this test passes on the base branch unchanged; removing or breaking the new run.ts `emit("session_error", …)` would leave it green. It does not actually guard the #81 behavior and gives false confidence.

Fix: trigger an in-execution session.error event (a session that starts then fails mid-run, e.g. a valid provider + revoked/empty key → auth failure during the run) rather than a pre-prompt model-resolution failure, and assert the event originates from the in-loop path. A focused unit test that feeds a `session.error` event into the loop would also cover it deterministically.
   Suggested fix: Drive a real in-execution failure (valid provider + empty/revoked key so the session starts and fails mid-run with a 401/ProviderAuthError), or add a unit test that injects a `session.error` event into the loop and asserts `session_error` is emitted — so the assertion fails if the new in-loop emit is removed.

📋 Out-of-diff findings (3)

Sev	Location	Finding
🟡	`packages/cli/src/cli/cmd/run.ts:580`	session_error can be emitted more than once for one failure
⚪	`packages/cli/test/cli/run-session-error-emit.test.ts:24`	Unexplained --conditions=browser on a CLI subprocess
🟠	`packages/cli/test/cli/run-session-error-emit.test.ts:31`	Behavioral test doesn't cover the new code path

Reviewed 5 files · 0 inline · view all 3 findings ↗

aictrl · AI code review for fast-moving teams · aictrl.dev

Finding 1 (TRUE/FIX): replace subprocess test that used nonexistent-model (exercised pre-existing promptResult rejection handler, not the new in-loop emit) with a mock-server --attach test that injects a session.error event mid-stream. Test now fails if the new in-loop emit("session_error",…) is removed. Finding 2 (TRUE/FIX): drop --conditions=browser from the test subprocess. The flag was cargo-culted from the dev script; no packages in this repo export browser-specific conditions, so it was harmless but unexplained and potentially misleading. Removed without replacement. Finding 3 (TRUE/FIX): add a shared `let sessionErrorEmitted = false` flag in execute() scope. All three session_error emit sites (in-loop session.error handler, loopDone.catch, promptResult rejection handler) now short-circuit on the flag so at most one session_error is emitted per run regardless of which failure paths fire. New test also asserts count === 1.

byapparov · 2026-06-22T09:41:04Z

Round-2 code review verdict — PR #81

Analyzed at: 2026-06-22T09:41:03Z
Commit reviewed: 58e48e4
Fix commit: b09fec9
Round: 2

#	Severity	Location	Finding	Verdict	Action
1	🟠	`test/cli/run-session-error-emit.test.ts:31`	Behavioral test doesn't cover the new code path	TRUE	FIX
2	⚪	`test/cli/run-session-error-emit.test.ts:24`	Unexplained `--conditions=browser`	TRUE	FIX
3	🟡	`run.ts:~580`	`session_error` can be emitted more than once	TRUE	FIX

Finding 1 — TRUE / FIX

Evidence: The round-1 subprocess test used --model nonexistent-provider/nonexistent-model, which fails at model resolution before the event loop starts. This triggers the pre-existing promptResult rejection handler (run.ts:774), not the new in-loop session.error handler (run.ts:580). Confirmed: removing the new in-loop emit("session_error",…) left the round-1 test green.

Fix: Replaced the subprocess test with a mock-server --attach test. The mock HTTP server serves an SSE stream containing a session.error event with ProviderAuthError followed by a session.status idle event. The CLI receives the session.error event mid-loop, the new in-loop handler fires, and session_error is emitted to stdout. In the --attach path, promptResult stays Promise.resolve() so the promptResult rejection handler cannot fire — the new handler is the only emit site reachable in this scenario.

Verification: Temporarily replaced the in-loop emit("session_error",…) with a no-op — test went red with expect(0).toBeGreaterThan(0). Restored emit — test is green.

Finding 2 — TRUE / FIX

Evidence: --conditions=browser was cargo-culted from package.json's dev script via the round-1 test. No package in this repo exports browser-specific module conditions, so it was harmless but unexplained. In the context of a CLI subprocess test it is actively confusing: reviewers (and the bot) correctly flag that forcing the browser condition could resolve shimmed/no-op exports.

Fix: Removed --conditions=browser from the test subprocess args. The new test runs under default (node) conditions. No behaviour change since no browser conditions are defined.

Finding 3 — TRUE / FIX

Evidence: Three independent session_error emit sites exist in execute():

In-loop session.error event handler (line 580) — fires on session.error SSE event
loopDone.catch() (line 727) — fires if loop() itself throws
promptResult rejection handler (line 774) — fires if sdk.session.prompt() rejects before entering the loop

Sites 1 and 2 can co-fire: a session.error event fires site 1, then if the event stream subsequently throws (e.g. network drop), loop() rejects and fires site 2. Sites 1 and 3 are mutually exclusive in the --attach path (promptResult never rejects) but can co-fire in the local bootstrap path if the prompt call rejects AND a session.error event was already observed.

Fix: Added let sessionErrorEmitted = false in execute() scope. All three emit sites check-and-set the flag before emitting, so at most one session_error is emitted per run. The new test asserts sessionErrorEvents.length === 1.

{
  "schemaVersion": "1",
  "sourcePR": 81,
  "repo": "aictrl-dev/cli",
  "round": 2,
  "analyzedAt": "2026-06-22T09:41:03Z",
  "headSha": "58e48e420",
  "fixSha": "b09fec9d3",
  "findings": [
    { "id": "f1", "severity": "orange", "verdict": "TRUE", "action": "FIX", "location": "test/cli/run-session-error-emit.test.ts:31", "summary": "Behavioral test didn't cover in-loop session.error path; replaced with mock-server attach test" },
    { "id": "f2", "severity": "white", "verdict": "TRUE", "action": "FIX", "location": "test/cli/run-session-error-emit.test.ts:24", "summary": "Removed unexplained --conditions=browser from test subprocess" },
    { "id": "f3", "severity": "yellow", "verdict": "TRUE", "action": "FIX", "location": "run.ts:580", "summary": "Added sessionErrorEmitted guard flag; all 3 emit sites short-circuit so at most one session_error per run" }
  ]
}

fix(run): emit session_error for session failures

9969372

byapparov self-assigned this Jun 13, 2026

byapparov added this to the Enterprise Observability milestone Jun 22, 2026

aictrl-dev Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread packages/cli/src/cli/cmd/run.ts Outdated

Comment thread packages/cli/src/cli/cmd/run.ts Outdated

Comment thread packages/cli/test/cli/run-schema-v1.test.ts Outdated

aictrl-dev Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread packages/cli/test/cli/run-session-error-emit.test.ts Outdated

Comment thread packages/cli/src/cli/cmd/run.ts Outdated

Comment thread packages/cli/test/cli/run-session-error-emit.test.ts Outdated

byapparov merged commit 5d84d0b into main Jun 22, 2026
5 checks passed

byapparov deleted the fix/session-error-json branch June 22, 2026 11:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(run): emit session_error for session failures#81

fix(run): emit session_error for session failures#81
byapparov merged 3 commits into
mainfrom
fix/session-error-json

byapparov commented Jun 13, 2026

Uh oh!

aictrl-dev Bot commented Jun 17, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aictrl-dev Bot commented Jun 22, 2026

Uh oh!

byapparov commented Jun 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aictrl-dev Bot commented Jun 22, 2026

Uh oh!

byapparov commented Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

byapparov commented Jun 13, 2026

Summary

Validation

Uh oh!

aictrl-dev Bot commented Jun 17, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aictrl-dev Bot commented Jun 22, 2026

Code review

Uh oh!

byapparov commented Jun 22, 2026

Review response — PR #81

Issues addressed (pushed to this PR)

Review claims verified false (no change needed)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aictrl-dev Bot commented Jun 22, 2026

Code review

Uh oh!

byapparov commented Jun 22, 2026

Round-2 code review verdict — PR #81

Finding 1 — TRUE / FIX

Finding 2 — TRUE / FIX

Finding 3 — TRUE / FIX

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant